Add LiteRT-LM engine support for Android (.litertlm models) #176

Merged
DenisovAV merged 5 commits into main from feature/android-litertlm on Jan 24, 2026

DenisovAV commented Jan 23, 2026

Summary

  • Add LiteRT-LM SDK integration for Android platform (v0.9.0-alpha01)
  • Implement Strategy Pattern for engine abstraction (InferenceEngine / InferenceSession)
  • Support both .task (MediaPipe) and .litertlm (LiteRT-LM) model formats
  • Add multimodal support (text + image) for LiteRT-LM models
  • Add Gemma 3 Nano 2B and 4B LiteRT-LM model options in example app
  • Refactor PreferredBackend enum: add NPU, remove unsupported SDK values

Architecture

InferenceEngine (interface)
├── MediaPipeEngine (.task files)
└── LiteRtLmEngine (.litertlm files)

InferenceSession (interface)  
├── MediaPipeSession
└── LiteRtLmSession

EngineFactory.createFromModelPath() automatically selects the correct engine based on file extension.
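
A minimal sketch of the extension-based selection described above. The `EngineKind` enum and the `engineKindFor` function are hypothetical names used only for illustration; the real `EngineFactory.createFromModelPath()` also receives a context and returns an engine instance. Case-insensitive matching and the exact error messages are assumptions, since this conversation does not show them.

```kotlin
// Hypothetical stand-in for the two engine implementations.
enum class EngineKind { MEDIAPIPE, LITERT_LM }

// Pick an engine kind from the model file's extension.
// Assumption: matching is case-insensitive and unknown extensions are rejected.
fun engineKindFor(modelPath: String): EngineKind {
    // Look only at the file name so dots in directory names are ignored.
    val fileName = modelPath.substringAfterLast('/')
    val extension = fileName.substringAfterLast('.', missingDelimiterValue = "").lowercase()
    return when (extension) {
        "task" -> EngineKind.MEDIAPIPE
        "litertlm" -> EngineKind.LITERT_LM
        "" -> throw IllegalArgumentException("Model path has no file extension: $modelPath")
        else -> throw IllegalArgumentException("Unsupported model format: .$extension")
    }
}

fun main() {
    println(engineKindFor("/data/models/gemma.task"))
    println(engineKindFor("/data/models/gemma.litertlm"))
}
```

Keeping the decision in one function means callers never need to know which SDK backs a given model file.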

PreferredBackend Changes

| Before | After |
|---|---|
| unknown | ❌ removed |
| cpu | ✅ cpu |
| gpu | ✅ gpu |
| gpuFloat16 | ❌ removed (not in SDK) |
| gpuMixed | ❌ removed (not in SDK) |
| gpuFull | ❌ removed (not in SDK) |
| tpu | ❌ removed (not in SDK) |
| (none) | ✅ npu (LiteRT-LM only) |

NPU Support:

  • LiteRT-LM: Full NPU support (Google Tensor, Qualcomm)
  • MediaPipe: NPU not supported (fallback to default)
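
The fallback rule above could be expressed roughly as follows. This is a sketch under stated assumptions: `EngineKind` and `resolveBackend` are hypothetical names, the enum below only mirrors the three values kept after the refactor, and returning `null` is used here to mean "let the engine pick its default backend". The plugin's actual mapping code is not shown in this conversation.

```kotlin
// Mirrors the three values kept in the refactored enum (cpu, gpu, npu).
enum class PreferredBackend { CPU, GPU, NPU }

// Hypothetical engine identifiers, for illustration only.
enum class EngineKind { MEDIAPIPE, LITERT_LM }

// Resolve the backend an engine will actually be asked to use.
// A null result means "use the engine's default backend".
fun resolveBackend(engine: EngineKind, preferred: PreferredBackend?): PreferredBackend? =
    when {
        preferred == null -> null
        // MediaPipe has no NPU support, so fall back to its default.
        engine == EngineKind.MEDIAPIPE && preferred == PreferredBackend.NPU -> null
        else -> preferred
    }

fun main() {
    println(resolveBackend(EngineKind.LITERT_LM, PreferredBackend.NPU))
    println(resolveBackend(EngineKind.MEDIAPIPE, PreferredBackend.NPU))
}
```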

Key Files

  • engines/InferenceEngine.kt - Engine abstraction
  • engines/InferenceSession.kt - Session abstraction
  • engines/EngineFactory.kt - Factory with auto-detection
  • engines/litertlm/LiteRtLmEngine.kt - LiteRT-LM implementation
  • engines/litertlm/LiteRtLmSession.kt - LiteRT-LM session with chunk buffering
  • engines/mediapipe/MediaPipeEngine.kt - MediaPipe wrapper
  • pigeon.dart - PreferredBackend enum definition

Known Issues (for future PRs)

  • Race condition in session access (documented in code review)
  • No cancellation support in LiteRT-LM SDK 0.9.x
  • Token counting is estimated (~4 chars/token)
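
For reference, a character-based estimate like the one mentioned in the last item might look like the sketch below. The function name and the choice to round up are assumptions; the PR only states the ratio of roughly 4 characters per token.

```kotlin
// Heuristic from the PR description: roughly 4 characters per token.
const val CHARS_PER_TOKEN = 4

// Estimate the token count by rounding the character count up to whole tokens.
// Assumption: rounding up; the PR does not say how remainders are handled.
fun estimateTokens(text: String): Int =
    (text.length + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN

fun main() {
    println(estimateTokens("Hello, world!")) // 13 characters -> 4
}
```

An estimate like this is cheap but can drift noticeably from a real tokenizer, especially for non-English text.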

Implement Strategy pattern for inference engines with two backends:
- MediaPipe (existing .task files)
- LiteRT-LM (new .litertlm files with multimodal support)

Key changes:
- Add InferenceEngine interface with Engine/Session abstractions
- Add EngineFactory for automatic engine selection based on file extension
- Implement LiteRtLmEngine with visionBackend for multimodal models
- Implement LiteRtLmSession with chunk buffering for MediaPipe compatibility
- Add thread-safety (synchronized locks) in FlutterGemmaPlugin
- Add LiteRT-LM SDK dependency (0.9.0-alpha01)
- Add gemma3n LiteRT-LM model options in example app
- Add unit tests for engines

Tested with Gemma 3 Nano E2B multimodal (text + image) on Pixel 8.

Copilot AI left a comment

Pull request overview

This pull request adds support for LiteRT-LM models (.litertlm files) to the Flutter Gemma plugin by introducing a Strategy Pattern-based engine abstraction layer. The PR refactors the existing MediaPipe inference code into adapters and adds a new LiteRT-LM engine implementation alongside it.

Changes:

  • Introduces InferenceEngine and InferenceSession abstractions with MediaPipe and LiteRT-LM implementations
  • Adds EngineFactory for automatic engine selection based on model file extension
  • Updates FlutterGemmaPlugin to use the new abstraction layer with improved thread safety
  • Adds two new Gemma 3 Nano model variants (2B and 4B) using LiteRT-LM format in the example app

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/InferenceEngine.kt | Core engine abstraction interface defining initialization, session creation, and capabilities |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/InferenceSession.kt | Session abstraction interface for text/image input and response generation |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineConfig.kt | Configuration data classes and SharedFlow factory for both engines |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactory.kt | Factory for automatic engine selection based on file extension |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeEngine.kt | Adapter wrapping existing MediaPipe LlmInference implementation |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeSession.kt | Adapter wrapping existing MediaPipe session implementation |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngine.kt | New LiteRT-LM engine implementation with caching support |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSession.kt | New LiteRT-LM session with chunk buffering for MediaPipe compatibility |
| android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt | Updated to use engine abstraction with enhanced synchronization and cleanup |
| android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactoryTest.kt | Comprehensive tests for factory engine selection logic |
| android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngineTest.kt | Unit tests for LiteRT-LM engine capabilities and lifecycle |
| android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSessionTest.kt | Unit tests for LiteRT-LM session including thread safety and token estimation |
| example/lib/models/model.dart | Adds Gemma 3 Nano 2B and 4B LiteRT-LM model variants, fixes local model filename |
| android/build.gradle | Adds LiteRT-LM SDK dependency (v0.9.0-alpha01) |
Comments suppressed due to low confidence (3)

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:142

  • Resource leak on initialization failure: If newEngine.initialize() at line 129 throws an exception, the newEngine instance created at line 128 is not closed. This could leak resources if the engine constructor allocated any resources before initialization failed. Consider wrapping the initialize call in a try-catch that closes the engine on failure before rethrowing.
        // Create and initialize new engine BEFORE clearing old state
        // This ensures we don't leave state inconsistent on failure
        val newEngine = EngineFactory.createFromModelPath(modelPath, context)
        newEngine.initialize(config)

        // Only now clear old state and swap in new engine (thread-safe)
        synchronized(engineLock) {
          session?.close()
          session = null
          engine?.close()
          engine = newEngine
        }

        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
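
One way to address the leak described in this comment is to close the new engine if its initialization throws, then rethrow. The sketch below is illustrative only: `Engine`, `createInitialized`, and `FailingEngine` are hypothetical names, not the plugin's types, and the plugin's `initialize` takes a config argument that is omitted here.

```kotlin
import java.io.Closeable

// Hypothetical minimal engine interface for illustration.
interface Engine : Closeable {
    fun initialize()
}

// Create an engine and initialize it; if initialization throws,
// close the half-built engine before rethrowing so nothing leaks.
fun <E : Engine> createInitialized(factory: () -> E): E {
    val engine = factory()
    try {
        engine.initialize()
    } catch (e: Exception) {
        // Best-effort cleanup; keep the original failure as the primary error.
        runCatching { engine.close() }.onFailure { e.addSuppressed(it) }
        throw e
    }
    return engine
}

// Test double that fails during initialization and records whether it was closed.
class FailingEngine : Engine {
    var closed = false
        private set

    override fun initialize(): Unit = throw IllegalStateException("init failed")

    override fun close() {
        closed = true
    }
}

fun main() {
    val engine = FailingEngine()
    val result = runCatching { createInitialized { engine } }
    println("failed=${result.isFailure}, closed=${engine.closed}") // failed=true, closed=true
}
```

The existing state is still swapped only after this helper returns, so the "initialize before clearing old state" ordering in the snippet above is preserved.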

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:216

  • Race condition: Session access is not properly synchronized. Lines 209-211 read the session field outside of synchronization, but the session can be nullified by closeSession() (line 198) or createSession() (lines 184-185) concurrently. This could cause null pointer exceptions or use-after-close errors.

The same issue exists in addQueryChunk (222-224), addImage (235-237), generateResponse (248-250), generateResponseAsync (261-263), and stopGeneration (274-276).

Solution: Wrap the session access in synchronized(engineLock) to ensure consistent access across all methods that read or write to the session field.

  override fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        val size = currentSession.sizeInTokens(prompt)
        callback(Result.success(size.toLong()))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
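
The suggested fix can be sketched as a small helper that both reads and uses the session while holding the lock. Everything here is a simplified stand-in: `Session`, `SessionHolder`, and `withSession` are hypothetical names, and only the `engineLock` name is taken from the review comment.

```kotlin
// Hypothetical session type for illustration.
class Session {
    private var closed = false

    fun sizeInTokens(prompt: String): Int {
        check(!closed) { "Session is closed" }
        return (prompt.length + 3) / 4
    }

    fun close() {
        closed = true
    }
}

class SessionHolder {
    private val engineLock = Any()
    private var session: Session? = null

    fun createSession() {
        synchronized(engineLock) {
            session?.close()
            session = Session()
        }
    }

    fun closeSession() {
        synchronized(engineLock) {
            session?.close()
            session = null
        }
    }

    // Run `block` against the current session while holding the lock, so a
    // concurrent closeSession() cannot close it between the null check and the call.
    fun <T> withSession(block: (Session) -> T): T = synchronized(engineLock) {
        val current = session ?: throw IllegalStateException("Session not created")
        block(current)
    }
}

fun main() {
    val holder = SessionHolder()
    holder.createSession()
    println(holder.withSession { it.sizeInTokens("hello") }) // 2
    holder.closeSession()
    println(runCatching { holder.withSession { it.sizeInTokens("hello") } }.isFailure) // true
}
```

Note the trade-off: holding the lock for the duration of a long call such as response generation also blocks `closeSession()` until that call returns, so a real implementation may prefer a finer-grained scheme.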

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:313

  • Missing synchronization on engine access: The engine field is accessed inside synchronized(engineLock) at line 290, but the streamJob creation at line 292 happens inside that scope while accessing engine flows (lines 294, 306). If the engine is closed or replaced between checking it at line 290 and accessing its flows, this could result in flows from a closed/different engine being collected.

Additionally, streamJob is modified at line 292 without synchronization, but is also accessed in onCancel() at line 317 without synchronization, which could cause race conditions.

  override fun onListen(arguments: Any?, events: EventChannel.EventSink?) {
    // Cancel previous stream collection to prevent orphaned coroutines
    streamJob?.cancel()
    eventSink = events

    synchronized(engineLock) {
      val currentEngine = engine ?: return

      streamJob = scope.launch {
        launch {
          currentEngine.partialResults.collect { (text, done) ->
            val payload = mapOf("partialResult" to text, "done" to done)
            withContext(Dispatchers.Main) {
              events?.success(payload)
              if (done) {
                events?.endOfStream()
              }
            }
          }
        }

        launch {
          currentEngine.errors.collect { error ->
            withContext(Dispatchers.Main) {
              events?.error("ERROR", error.message, null)
            }
          }
        }
      }
    }
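
For the unsynchronized `streamJob` field mentioned in this comment, one option is an atomic swap instead of a lock. This is a different technique from the `synchronized` blocks used elsewhere in the plugin, shown here only as a sketch: `Cancellable` is a hypothetical stand-in for a coroutine `Job`, and `StreamJobHolder` is not part of the PR.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for a coroutine Job: anything that can be cancelled.
fun interface Cancellable {
    fun cancel()
}

class StreamJobHolder {
    private val current = AtomicReference<Cancellable?>(null)

    // Install a new job and cancel whichever job it replaced.
    fun replace(job: Cancellable) {
        current.getAndSet(job)?.cancel()
    }

    // Cancel and clear the current job; safe to call from any thread and idempotent.
    fun cancel() {
        current.getAndSet(null)?.cancel()
    }
}

fun main() {
    var cancelled = 0
    val holder = StreamJobHolder()
    holder.replace(Cancellable { cancelled++ })
    holder.replace(Cancellable { cancelled++ }) // cancels the first job
    holder.cancel()                             // cancels the second job
    holder.cancel()                             // no-op: nothing left to cancel
    println(cancelled) // 2
}
```

Because `getAndSet` hands each job to exactly one caller, `onListen` and `onCancel` cannot both cancel or both miss the same job.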

- Remove non-existent SDK values: unknown, gpuFloat16, gpuMixed, gpuFull, tpu
- Add NPU backend support for LiteRT-LM (Google Tensor, Qualcomm)
- Simplify backend mapping across all engines
- Use Pigeon-generated PreferredBackend directly instead of PreferredBackendEnum
- Update tests for NPU backend
- Fix Copilot review issues: typo in test comment, error message for missing extension

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:213

  • Race condition: The session variable is accessed without synchronization. After val currentSession = session is read on line 206, another thread could call closeSession() (lines 191-200) and set session to null, so the subsequent call to currentSession.sizeInTokens(prompt) would operate on a session that has already been closed. The same issue exists in the addQueryChunk, addImage, generateResponse, generateResponseAsync, and stopGeneration methods.

The session variable should either be marked as @Volatile and accessed within synchronized blocks, or the entire method body should be wrapped in synchronized(engineLock) { ... }. Compare with createSession (lines 168-183) and closeSession (lines 192-199), which properly use synchronized(engineLock).

  override fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        val size = currentSession.sizeInTokens(prompt)
        callback(Result.success(size.toLong()))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:227

  • Race condition: Same session synchronization issue as in sizeInTokens. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
  override fun addQueryChunk(prompt: String, callback: (Result<Unit>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        currentSession.addQueryChunk(prompt)
        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:240

  • Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
  override fun addImage(imageBytes: ByteArray, callback: (Result<Unit>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        currentSession.addImage(imageBytes)
        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:253

  • Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
  override fun generateResponse(callback: (Result<String>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        val result = currentSession.generateResponse()
        callback(Result.success(result))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:265

  • Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.
  override fun generateResponseAsync(callback: (Result<Unit>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        currentSession.generateResponseAsync()
        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }

- Document backend support per platform (Android, iOS, Web, Desktop)
- Clarify that CPU is not supported on Web (MediaPipe limitation)
- Clarify that NPU is Android-only (.litertlm models)
- Add docstrings to PreferredBackend enum in pigeon.dart
- Update proto comments for desktop backend options

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:310

  • Race condition: The engine reference is captured inside a synchronized block but then used outside of it when launching the coroutine. If another thread calls createModel or closeModel between capturing the reference and the coroutine starting to collect from it, the engine could be closed while the coroutine is still collecting from its flows.

Consider capturing the flows (partialResults and errors) inside the synchronized block and collecting from those captured references, or keep the engine reference alive with proper lifecycle management.

    synchronized(engineLock) {
      val currentEngine = engine ?: return

      streamJob = scope.launch {
        launch {
          currentEngine.partialResults.collect { (text, done) ->
            val payload = mapOf("partialResult" to text, "done" to done)
            withContext(Dispatchers.Main) {
              events?.success(payload)
              if (done) {
                events?.endOfStream()
              }
            }
          }
        }

        launch {
          currentEngine.errors.collect { error ->
            withContext(Dispatchers.Main) {
              events?.error("ERROR", error.message, null)
            }
          }
        }
      }
    }

Comment on lines +245 to +247
val currentSession = session
  ?: throw IllegalStateException("Session not created")
val result = currentSession.generateResponse()

Copilot AI Jan 24, 2026

Race condition: Session is accessed without synchronization. If another thread calls closeSession, closeModel, or createSession while this method is executing, the captured session reference could point to a closed session, leading to use-after-close errors.

Consider using synchronized(engineLock) when capturing the session reference to ensure thread-safe access.

Comment on lines +271 to +273
val currentSession = session
  ?: throw IllegalStateException("Session not created")
currentSession.cancelGeneration()

Copilot AI Jan 24, 2026

Race condition: Session is accessed without synchronization. If another thread calls closeSession, closeModel, or createSession while this method is executing, the captured session reference could point to a closed session, leading to use-after-close errors.

Consider using synchronized(engineLock) when capturing the session reference to ensure thread-safe access.

CLAUDE.md Outdated
Comment on lines 546 to 548
├── EngineConfig.kt # Configuration data classes
├── EngineFactory.kt # Factory for engine creation
├── FlowFactory.kt # SharedFlow factory

Copilot AI Jan 24, 2026

Documentation inaccuracy: FlowFactory is documented as a separate file (FlowFactory.kt) but it's actually part of EngineConfig.kt as an object. Update the architecture diagram to reflect that FlowFactory is within EngineConfig.kt, not a separate file.

Suggested change
- ├── EngineConfig.kt # Configuration data classes
- ├── EngineFactory.kt # Factory for engine creation
- ├── FlowFactory.kt # SharedFlow factory
+ ├── EngineConfig.kt # Configuration data classes and FlowFactory object (SharedFlow factory)
+ ├── EngineFactory.kt # Factory for engine creation

@DenisovAV DenisovAV merged commit edd01a8 into main Jan 24, 2026
3 checks passed